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Time and spectral domain relative entropy: 
A new approach to multivariate spectral 

estimation 
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Abstract 

The concept of spectral relative entropy rate is introduced for jointly stationary Gaussian processes. 
Using classical information-theoretic results, we establish a remarkable connection between time and 
spectral domain relative entropy rates. This naturally leads to a new spectral estimation technique where 
a multivariate version of the Itakura-Saito distance is employed. It may be viewed as an extension of 
the approach, called THREE, introduced by Byrnes, Georgiou and Lindquist in 2000 which, in turn, 
followed in the footsteps of the Burg-Jaynes Maximum Entropy Method. Spectral estimation is here 
recast in the form of a constrained spectrum approximation problem where the distance is equal to 
the processes relative entropy rate. The corresponding solution entails a complexity upper bound which 
improves on the one so far available in the multichannel framework. Indeed, it is equal to the one featured 
by THREE in the scalar case. The solution is computed via a globally convergent matricial Newton-type 
algorithm. Simulations suggest the effectiveness of the new technique in tackling multivariate spectral 
estimation tasks, especially in the case of short data records. 

Index Terms 

Multivariate spectral estimation, spectral entropy, convex optimization, maximum entropy, matricial 
Newton method. 

Work partially supported by the Italian Ministry for Education and Research (MIUR) under PRIN grant n. 20085FFI2Z 
"New Algorithms and Applications of System Identification and Adaptive Control" 

A. Ferrante and C. Masiero are with the Dipartimento di Ingegneria dell'Informazione, Universita di Padova, via Gradenigo 
6/B, 35131 Padova, Italy augustoSdei . unipd . it, masiero . chiara@dei . unipd . it 

M. Pavon is with the Dipartimento di Matematica Pura ed Applicata, Universita di Padova, via Trieste 63, 35131 Padova, 
Italy pavon@math . unipd . it 



January 18, 2013 



DRAFT 



DRAFT 



2 



I. Introduction 

Multidimensional spectral estimation is an old and challenging problem ll36ll . Il46ll which 
keeps generating widespread interest in the natural and engineering sciences, see e.g. Il20l , 11241 . 
EH, ll42l . A new approach to scalar spectral estimation called THREE was introduced by 
Byrnes, Georgiou and Lindquist in flU, [fT9ll . It may be viewed as a (considerable) generalization 
of classical Burg-like maximum entropy methods. This estimator permits higher resolution in 
prescribed frequency bands and is particularly competitive in the case of short observation 
records. In this approach, the output covariance of a bank of filters is used to extract information 
on the input power spectrum. A first attempt to generalize this approach to the multichannel 
situation was made in ll42l . where, due to the lack of multidimensional theoretical results, a 
non entropy-like distance was employed in the optimization part of the procedure. The resulting 
solution, however, had higher McMillan degree than in the original scalar THREE method. 

The main contribution of this paper is twofold: On the one hand, we introduce what appears 
to be a most natural multivariate generalization of the THREE method, called RER since the 
metric employed in the optimization problem originates from the relative entropy rate of the two 
processes. The latter may be viewed as a multivariate extension of the classical Itakura-Saito 
distance widely used in signal processing flU. The proposed method features a complexity upper 
bound which, considerably improving on the one so far available, is in fact equal to the one 
featured by THREE in the scalar case. Like all previous THREE-like methods, RER exhibits high 
resolution features and works extremely well, outperforming classical identification methods, in 
the case of short observation records. On the other hand, further cogent support for the choice 
of our distance measure between spectra is provided by a novel information-theoretic result: 
We introduce the concept of spectral entropy rate for stationary Gaussian processes and we 
establish a circular symmetry property of the increments of the process occurring in the spectral 
representation. Then, using classical results of Pinsker [|40Tl . Van den Bos [48], Stoorvogel and 
Van Schuppen [|47l . we prove that the time and spectral domain relative entropy rates are in fact 
equal! This profound result is deferred to the last section of the paper for expository reasons. 

The paper is outlined as follows. Section In] collects basic results on entropy for Gaussian 



vectors and processes. Section III introduces THREE-like spectral estimation methods. Section 



IV presents the new approach RER via a convex optimization problem and derives the form 
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of the optimal spectral estimate. In Section |Vl we establish a nontrivial existence result for the 



dual problem. A globally convergent, matricial Newton-type method is presented in Section VI 



to solve the dual problem. The computational burden is dramatically reduced thanks to various 



nontrivial results of spectral factorization. In Section VII[ both scalar and multivariate examples 



are studied via simulation: the performance of the RER method is compared to that of previously 



available approaches. In Section VIII, some background results on complex Gaussian random 
vectors and on the spectral representation of stationary Gaussian processes are presented. Finally, 
in Section [EXj we introduce the spectral relative entropy rate of Gaussian processes and establish 



a profound connection between time and spectral domain relative entropy rates. 

II. Background on entropy for Gaussian processes 

We collect below some basic concepts and results on entropy of Gaussian random vectors 
and processes that may be found e.g. in ll40l . OTI . 021. The differential entropy H(p) of a 
probability density function p on M. n is defined by 

H{p) — — j \og{p{x))p{x)dx . (1) 

In case of a zero-mean Gaussian density p with nonsingular covariance matrix P, we get 

Hip) = \ log(det P) + l -n (1 + log(2vr)) , (2) 

The relative entropy or Kullback-Leibler pseudo-distance or divergence between two probability 
densities p and q, with the support of p contained in the support of q, is defined by 

B(p\\q):= [ p(x)log?f\dx, (3) 

see e.g [12J. In the case of two zero-mean Gaussian densities p and q with positive definite 
covariance matrices P and Q, respectively, the relative entropy is given by: 

B(p\\q) = l - [logdet(p- 1 g) + tr(g- x P) - n] . (4) 

Consider now a discrete-time Gaussian process {yk, k G Z} taking values in IR m . Let Yi n>n i be 
the random vector obtained by considering the window y_ n ,y_ n+ i, ■ ■ ■ , y ,- ■ ■ , y n -i,y n , and 
let py, n] denote the corresponding joint density. 

Definition 2.1: The (differential) entropy rate of y is defined by 

h r (y) := lim H(p Y , ,), (5) 
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if the limit exists. 

In ll33ll . Kolmogorov established the following important result. 

Theorem 2.1: Let y = {y^, k E Z} be a M m -valued, zero-mean, Gaussian, stationary, purely 

nondeterministic stochastic process with spectral density $ y . Then 

m 1 r w 

h r (y) = - log(27re) + — J logdet S> v (e?*)d#. (6) 

As is well-known, there is also a fundamental connection between the quantity appearing in ([6]) 
and the optimal one-step-ahead predictor: The multivariate Szego- Kolmogorov formula. It is 

det i? = exp | ^- y log det ^(e^dtf j , (7) 

where R is the error covariance matrix corresponding to the optimal predictor. Let y = {y k ; k £ 
Z}, z = {zk, k E Z} be two zero-mean, jointly Gaussian, stationary, purely nondeterministic 
processes taking values in IR m . Let Y[-n,n] and Z[_, t rt ] be defined as above. 
Definition 2.2: The relative entropy rate between y and z is defined by 

BJy\\z) := Km — - — D(W, .\\p z . .) (8) 

rVy|1 ' n-kx>2n+l ^*[-»,n]MJ'-*[-n 1 n] J ' V ' 

if the limit exists. 

The following interesting result holds (see Il47ll . [j3T| ). 

Theorem 2.2: Let y = {y^, k E Z} and z = {z fc ; E Z} be IR m -valued, zero-mean, 
Gaussian, stationary, purely nondeterministic processes with spectral density functions $ y and 
$ 2 , respectively. Assume, moreover, that at least one of the following conditions is satisfied: 

1) Qy&z 1 is bounded; 

2) $ y E L 2 (— 7r, 7r) and $ 2 is coercive (i.e. 3 a > s.t. $ 2 (e j,? ) - a/ m > a.e. on T). 
Then 



1 

47T 



{logdet (^(e^^e**)) + tr [^{eP) ($„{€**) - $ z (e*>))] } d&. (9) 



III. THREE-LIKE ESTIMATION AND GENERALIZED MOMENT PROBLEMS 

We denote by S+ Xm the family of bounded and coercive spectral densities on T := {z E 
C : \z\ — 1} of Revalued processes. Suppose that the data {yi}f =l are generated by an 
unknown, zero-mean, m-dimensional, IR m -valued, purely nondeterministic, stationary, Gaussian 
process y = {y^, k E Z}. We wish to estimate the spectral density $ E S+ Xm of y from 
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{Vi\iLi- A THREE-like approach generalizes Burg-like methods in several ways. The second 
order statistics that are estimated from the data {yi]f =l are not necessarily the covariance lags 
C\ := E{yk+iyJ} of y. Moreover, a prior estimate of $ may be included in the estimation 
procedure. More explicitly these methods hinge on the following four elements: 
1) A rational filter to process the data. The filter has transfer function 



where A G IR" xn is a stability matrix (i.e. it has all its eigenvalues inside the unit circle), 
B e R nxm is full rank, n > m, and (A, B) is a reachable pair; 
2) an estimate based on the data {yt}f =1 of the steady-state covariance £ of the state x(k) 
of the filter 



3) a prior spectral density ^ e <S+ xm ; 

4) an index that measures the distance between two spectral densities. 

The filterbank ( fTT| ) provides Caratheodory or, more generally, Nevanlinna-Pick interpolation data 
for the positive real part $ + of $, see fl5] Section II]. This occurs through the constraint 



which must be satisfied by the spectrum of y (here and throughout the paper, integration — when 
not otherwise specified — is on the unit circle with respect to normalized Lebesgue measure). 
Concerning the spectral density ^: It allows to take into account possible a priori information 
on $, a contingency that is frequent in practice. For example, \& may simply be a coarse estimate 
of the true spectrum^ Since, in general, ^ is not consistent with the interpolation conditions, 
an approximation problem arises. It is then necessary to introduce an adequate distance index. 
This crucial choice is dictated by several requirements. On the one hand, the solution should 
be rational 1 of low McMillan degree at least when the prior ^ is such. On the other hand, 
the variational analysis should lead to a computable solution, typically by solving the dual 
optimization problem. In the scalar case flU, tl27l . the choice was made of minimizing the 

'When no prior information on $ is available, *!/ is set either to the identity or to the sample covariance of the available 
data {yi}iLi- Dually, the prior \& yields a smooth parameterization of solutions with bounded degree which permits tuning. 



G(z) = (zl - A)~ B 



(10) 



x(k + 1) = Ax(k) + By(k); 



(11) 




(12) 
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following Kullback-Leibler type criterion: cIkl(^,&) = / ^log§. This choice features both 
of the above specifications. In the multi variable case, a Kullback-Leibler pseudo-distance may 
also be readily defined [|24l|. inspired by the Umegaki-von Neumann's relative entropy ll39l of 
statistical quantum mechanics. The resulting spectrum approximation problem, however, leads 
to computable solutions of bounded McMillan degree only in the case when the prior spectral 
density has the form ^(z) = tp(z)I, where ij)(z) is a scalar spectral density (yielding the 
maximum entropy solution when = I, ||20l , [0, [1241 ). On the contrary, with the following 
multivariate extension of the Hellinger distance introduced in lfT51 . 

d H (V, $) 2 := inf tr / (W 9 - W*) {W* - W$)* , 

W*,W* J /yj\ 

such that WvW£ = -9 and W*W£ = $, 
which is a bona fide distance, the variational analysis can be carried out leading to a computable 



solution ((13) is just the L 2 -distance between the sets of square spectral factors of the two 
spectra). An effective multivariate THREE-like spectral estimation method can then be based on 
such a distance, leading to rational solutions when the prior \& is rational ll42l . The complexity of 
the solution, however, is usually noticeably higher than in the original scalar THREE approach. 
We show that, employing the relative entropy rate (|9]) as index for the approximation problem, 
the variational analysis can be carried out explicitly. Moreover, such a choice yields an upper 
bound on the complexity of the solution equal to that in the original THREE method. 

Remark 3.1: Notice that finding an input process that is compatible with the estimated co- 
variance and has rational spectrum of prescribed maximum degree turns into a Nevanlinna-Pick 
interpolation problem with bounded degree ll2l. [fT8l . The latter can be viewed as a generalized 
moment problem which is advantageously cast in the frame of various convex optimization 
problems. An example is provided by the covariance extension problem and its generalization, 
see ifTTll . ifTTIl ifTOl BE), 0, lEUl . These problems pose a number of theoretical and computational 
challenges for which we also refer the reader to fl27), J2TJ, [|22|. and 0. Besides signal 
processing, significant applications of this theory are found in modeling and identification 01, 
11291 Ifl4l robust control Q, J2H1, and biomedical engineering ll38Tl . 

Remark 3.2: In spectral estimation, it is important to develop problem- specific criteria for 
choosing a spectral density from a given family satisfying prescribed constraints or to be able to 
compare such spectral densities in an informative, quantitative manner. For instance, in 11231 . Il2~6l . 
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132), it was shown that a geometry entirely analogous to the geometry of the Fisher information 
metric exists for power spectral densities. Moreover, distances between power spectra can be 
used quite effectively in identifying transitions, changes, and affinity between time series or 
even spacial series. Applications include automated phoneme recognition by identifying natural 
transition time markers in speech, which separate segments of maximal spectral separation using 
a suitable metric and variants thereof. Along a similar line, two-dimensional distributions are 
identified on the inside and outside of a curve, and then, the curve is evolved using geometric 
active contours to ensure maximal separation of the spectral content of two regions. This idea 
has been recently applied to visual tracking [|25l . 



IV. A NEW METRIC FOR MULTIVARIATE SPECTRAL ESTIMATION 
Motivated by relation (|9]), we define a new pseudo-distance among spectra in <5™ xm : 

<W*,*) : = {^gdet ($-V"Me j,? )) +tr [HT 1 ^) ($(e^) - *(<*>))]} M. 

(14) 

Further motivation for this distance choice is provided by a profound, information-theoretic 



result relating time and spectral domain relative entropy rates, see Theorem 9.1 below. Notice 
that in the case of scalar spectra, cIrer(&, ^) = l/2dis(&, ^), where 

V ' ; 2tt J_ 7V \^(e^) & #(e^) J 
is the classical Itakura-Saito distance of maximum likelihood estimation for speech processing 
El, H). We now formulate the following Spectrum Approximation Problem: 



Problem 1: Let * G S+ xm , G{z) as in <g0> and £ = S 1 > 0. Find $° that solves: 



minimize d R E R ($,^) over j$ G S™ xm | ^ G$G* = £ j . 



Remark 4.1: Notice that we could also minimize the distance index ( [14] ) with respect to the 
second argument. Indeed, this choice is meaningful in some approximation problems related to 
minimum prediction error and model reduction, see [1331 . In our approximation problem ([T|), 
however , it is possible to prove that such a choice usually leads to a non rational approximant, 
even when the prior \I> is rational. Therefore, this approach is not suitable for our purposes. 

We first address the issue of feasibility of Problem [TJ namely existence of $ G 5™ xm (T) 



satisfying ( 12 ) where G is the transfer function of the bank of filters (11) and £ is the steady-state 
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covariance of the output process. To this aim we first introduce some notation: All through the pa- 
per, Q m C M. mxm denotes the m(m + l)/2-dimensional, real vector space of m-dimensional sym- 
metric matrices. We denote by C^ xm the set of continuous spectral densities of m-dimensional 
]R m -valued processes defined on the unit circle T. We indicate by V (C R ^ m ) the linear space 
generated by C^ xm . Let T : V (C^+ m ) ->> Q n be the linear operator defined by 



T($) := J G&G*. (15) 
The following result can be obtained along the same lines of EQ (see also 03)0 



Theorem 4.1: Consider E = E G W' ixn and a filter defined as in (10). Then: 



1) E is in Range(r) if and only if there exists H G ]R mxn such that 

E- AEA T = BH + H T B T . (16) 



2) Let the E G R nxn be positive definite. Then, there exists H G M mxn that solves (TT6J) if 
and only if there exists $ G such that T($) = E. 

From now on we assume feasibility of Problem [T] In view of the previous result, this is equivalent 



to the fact that Equation (16) admits a solution H. Moreover, to simplify the exposition, we 
assume that E = I. This can be done without loss of generality. In fact, if E ^ I, it suffices to 
replace G with G' := E _1 / 2 G and (A, B) with (A' = E^AE 1 / 2 , B' = E _1 / 2 5) to obtain an 
equivalent problem where E = /. We now proceed to solve Problem [TJ Since 

1 r* m 

— {-tr^- 1 (e^(e^)}d^ = -- (17) 

J — 7V 



the left-hand side of (17) plays no role in the optimization. It can, therefore, be neglected 



together with a \ multiplying the integral. Thus, Problem [I] is equivalent to minimizing, over 
5™ xm , 2d RER ($, *) + m = J {logdet ($- 1 ^) + tr (tf- 1 ^)} , subject to (@. Recall that the 
inner product in Q n is defined by (M, N) = tr[MiV]. We can then consider the Lagrangian 

L*($,A) = 2<W($, *) + m + (A, J G$G* - E) 



log^lll + tr^-^+t^AG^G*; 



trA, (18) 



2 In Eli l the general case was considered when A £ C" xn , B £ £ nxm an( j tne process y is complex-valued, too. In that 
case, it was proven that the Hermitian matrix E £ <£ nxn belongs to Range(r) if and only if there exists H £ C mxn solving 
the feasibility equation E - AY, A* = + H* B* . 
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where the Lagrange parameter A G Q n and we have used the assumption E = /. Notice that 
each A e Q n can be uniquely decomposed as A = Ar + Aj_, where Ar G Range (T) and Aj_ G 
(Range (r)) ± . It can be proven 021 Section III] that, VA ± G (Range (T)) 1 , G*(e^)A ± G( y e^) = 
0. Moreover, tr [A J = (Aj_, I) = 0, because / G Range (r) in view of the feasibility assumption. 



Hence, a term Aj_ G (Range (r)) gives no contribution to the Lagrangian ( fl8[ ) . We therefore 
assume from now on that the Lagrange parameter A belongs to Range (r). 



For A fixed, we consider now the unconstrained minimization of the functional (18) with 



respect to $. Observe that A) in ( fl8| ) is strictly convex on S™ xm . We impose that the first 
variation be zero in each direction G L™ xm . Recalling that, for a positive definite matrix X, 
the directional derivative of logdet(X) in direction SX is given by 

51ogdet(X; SX) = tr(X~ l SX), (19) 

we get: 

5L($, A; 5$) = J [-tr($- 1 <5$) + tr(tf- 1 <?$) + tx{G*AGS$)] = j {-$- l +y- 1 +G*AG,S$}. 

(20) 

Since + tf" 1 + G*AG] G L™ xm , d20j) is zero V<5$ G L™ xm if and only if 



$ = $°(A) : = + G*AG] . (21) 
Let be the stable and minimum phase spectral factor of ^j^jand Gi(e^) be defined by 

G x {j») := G(^)W 9 {e^). (22) 



It will be later interesting to consider also the alternative form of (21 ) 

$°(A) = Wv{I + GIAG^W*. (23) 
It is important to point out that ( |2~Tj ) yields an upper bound on the McMillan degree deg[$°] of 



the optimal approximant $°. Indeed, it follows from ( |2T| ) that deg[$°] < deg^] + 2n, where n 
is the McMillan degree of G(z). This result represents a significant improvement in the frame of 
multivariable spectral estimation, in which the best so far available upper bound on the McMillan 
degree (which can be regarded as a measure of complexity) of the solution was deg[\&] + An 
(see m). 

3 Since *P G 5™ xm , exists. It is unique up to multiplication on the right by a constant orthogonal matrix. 
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Since <j?° is required to be a bounded spectral density, we need, as indicated by ([23]), to restrict 
the Lagrange multiplier A to the subset C + , where 

C + := {A G Q n \I + G\KGi > a.e. on T} . (24) 

In conclusion, the natural set for the Lagrangian multiplier A is 

C r + := C+f] Range (T). (25) 

To sum up, the main result is that for each A G there exists a unique $° G ,S™ xm that 



minimizes the Lagrangian functional. It has the form (21). If we produce a A° s.t. <j?°(A° 



satisfies constraint (12), then such a $°(A°) is the solution of Problem [Tj Existence of such a 
A° turns out to be a most delicate issue. To address this problem, we resort to duality. 

V. The dual problem 



Consider 



inf A) = A) = J log det(J + G* 1 AG 1 ) + n - tr A. 



Instead of maximizing this expression, we will equivalently minimize the following functional 
hereafter referred to as the dual functional: 

MA) :=-£($°(A),A)+7i = J [trA-logdet(/ + G^AGi)]. (26) 

Recall that given a matrix A = A* > 0, we have J log det A = J tr log A. Hence, we can express 
the dual functional also as J^(A) = J tr [A — log(7 + G^AGi)] . Given 5 A G Q„, by means of 



(19) we can evaluate its first variation: 



5 J* (A; 5 A) = VJ*, A ((5A) = j {tr [5 A] - tr [(/ + C7*AC7 1 )" 1 C7* 1 (5Al7 1 ] } . (27) 



The results of this section show that there exists a unique A° G minimizing J* (A) in (26). 



Such a A° annihilates the directional derivative (27) in any direction SA G Q n , namely 



(I -J G^I + G* 1 A°G 1 )- 1 G* 1 ,6A) = V5A G Q n , (28) 

or, equivalently, 

J = J dil + G* 1 A°G l )- 1 G* 1 = J G$ (A°)G*. (29) 

This means that the corresponding spectral density $° := $(A°) = + C7*A°C7] , satisfies 
constraint (12) (recall that we set E = /) and is therefore the unique solution of Problem [T] 
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Uniqueness of the minimizing A G C T + is an obvious consequence of the following result. 

Theorem 5.1: The dual functional Jy(A) belongs to C 2 (£+) and is strictly convex on 

Proof: Consider a sequence M n G Range (r), such that M n — > 0, and define, for N G Q n , 
Q N {z) = I + G\{z)NG{z). By Lemma 5.2 in flU, Ql l +Mn converges uniformly to Q A , so that 
it is bounded above. Hence, applying the bounded convergence theorem, we get 

\im J tr [Q-^GldkGx] = J tr [g A 1 G^AG 1 ] , 

so that J* (A) belongs to C 1 ^ 1 ^). Consider now the second variation. Let us denote the matrix 
inversion operator by R : M M -1 and recall that its first derivative in direction 8M is given 
by 5R(M,SM) = -M l 8MM '. Then, for <JAi and 8A 2 in Q n , we have 

8 2 J*(A; (JA X , 5A 2 ) = y tr [(I + G*AGa) _1 G*5A 2 Gi(J + GJAGi^G^AiGj , (30) 

so that J* (A) is C 2 (£^). The bilinear form H\(-,-) := <5 2 J^(A; •, •) is the Hessian of Ja> at 
A. For 5 A G Range (r), which implies that (A + e8A) G for sufficiently small e, consider 

H A (8A, 5A) = 8 2 J*(A; 8 A, 8A). We get 

H\(8A, 8A) = J tr [(J + G^AGO^G^AG^I + GJAGy^G^AGi] 



(3D 

_L 

tr 



g A i G^AG 1 g A 1 G^AG 1 g, * 



5A 

which vanishes if and only if the integrand is identically zero. Moreover G\8AG\ = WyG*8AGWy 
is identically zero on T if and only if 8 A G Range (T)- 1 . On the other hand we have assumed 
8A G Range(r), so that the integrand is identically zero if and only if 8A = 0. In conclusion, 
the Hessian is positive-definite and the dual functional is strictly convex on £+. ■ 
The next and most delicate step is to prove that, although the set £^ is open and unbounded, 
a A° minimizing over £^ does exist. To this aim, first we prove that the function J$(A) 
is inf-compact, i.e. Va 6 I, the set {A G | J* (A) < a} is compact. To establish this fact, 
define to be the closure of £+, i.e. the set 

CJ+ = {A = A T G R nxn | A G Range(r), / + G\AG X > 0, G T} . 

Given that, for A belonging to the boundary dC T + , the Hermitian matrix / + G\AG\ is singular, 
in at least one point of T, it is useful to introduce the following sequence of functions on C T + : 



J£(A) = J tr 



A - log [ / + G*AGi + -I 
! n 



n > 1. (32) 
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Recall that a real- valued function / is said to be lower semicontinuous at x if, Ve > 0, there 
exists a neighborhood U of x such that, Va; G U, f(x) > f(xo) — e. Recall also that, given 
/ : ]R nxn — > E, its epigraph epi (/) is defined by 

epi (/) := {(x,a) G R nxn x E|a > f(x)} . 

Moreover, / is a lower semicontinuous (convex) function if and only if its epigraph is closed 
(convex), see e.g. [|43l . The following Lemmata allow to conclude that J$(A) is inf-compact 
over £+. 

Lemma 5.1: The pointwise limit J£°(A), defined as Jf^A) := lim^oo J£(A), exists and is a 
lower semicontinuous and convex function defined over C+, with values in the extended reals. 

Proof: The additive term -I ensures that, for each n, Jy{A) is a continuous and convex 
function of A on the closed set From the properties of J^(A), it follows that epi (J^(A)) is 
a closed and convex subset of R nxn x E. In addition, the pointwise sequence is monotonically 
increasing, since J$(A) < J^ +1 (A). Therefore, it converges to J$?(A) := sup n J|>(A). Since the 
intersection of closed sets is closed and the intersection of convex sets is convex, epi J£°(A) = 
n n epi J$(A) is closed and convex. As a consequence, J|°(A) is lower semicontinuous and 
convex. ■ 

Lemma 5.2: Assume that the feasibility condition ( p~6| ) holds. Given A G £+, there exist two 
real constants [i > and a such that: 



tr [A] > /i tr 



a. 



(33) 



Proof: Since E = J, by feasibility, there exists $/ G xm such that / GQfG* = I. Thus, 



tr [A] = tr 



tr 



G*AG$/ 



tr 



W| G* AG W$ 1 $/ 



tr 



(34) 



where the cyclic property of the trace was employed and the auxiliary spectral density H :- 
W^' 1 $/W$* has been defined. By defining a := — tr [J H] , it follows that 



tr [A] = tr 



y (GJAGi + I)E - tr y S = tr y (G*AGi + I)S 



+ a. 



(35) 



Let A be such that (G^AGi + I) = A* A (recall that we are assuming A G so that G^AGi + J 
is positive definite on T and admits a right spectral factor A) so that tr [(G\AGi + 7)S] = 
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tr [ASA*] . Given that H = W^, 1 &jW^ / * is a coercive spectrum, because both $/ and ^ belong 
to xm , there exists /x > s.t. S(e j,? ) > /j7, Ve jl? G T. Recalling that the trace and the integral 
are monotonic functionals, it is possible to conclude that 



tr [A] = tr 



J (G*AGi + 7) ~ 



a > /xtr 



y (G^AGi + 7) 



+ a. 



(36) 



Lemma 5.5: Define B : = {A G | det (GjAGi + 7) = 0, Ve j,? G T} and consider its 
complement set B c := {A G <9£+ \A<£B}. Then, under feasibility assumption: 

1) 7^° (A) is bounded from below on £+; 

2) Jf(A) = J^(A)on£r. 

3) J^(A) is finite over £ c . 

The proof can be found in the Appendix. 

Lemma 5.4: If the feasibility hypothesis holds, then, for A G £+, 



lim J* (A) = +oo. 



(37) 



See the Appendix for the proof. Then, by Weierstrass' Theorem we can conclude that there 
exists a minimum point A° G C v + . More can be proven: 



Theorem 5.2: If the feasibility condition (16) holds, the problem of minimizing J^(A) over 
admits a unique solution A° G £+. 

Proof: Since J* (A) is inf-compact over C+, it admits a minimum point A° there. Obviously, 
A° ^ B, since J* (A) = +oo on B (Lemma 5.3). Suppose A° G B c . By Lemma 5.3 again, it 



follows that J#(A°) is finite. By convexity of C v + , Me G [0, 1], A° +e(I - A°) G £+, since the 
feasibility condition ( fT6] ) ensures that 7 G £+. The one-sided directional derivative is 

"7*(A° + £(7-A ))-7*(A°) 



57m, + (A°;7- A c 



lim 

e\0 



tr [7 - A ] — J tr [(7 + G^Gt^G* (7 - A°) G x ] 
tr [7 - A ] - J tr [(7 + G*A°Gi) _1 {G^d - G*A°Gi 
tr [7 - A ] - J tr [(7 + C?iA Gi) _1 (7 + G{G X ) - I] 



I -I 



(38) 



-oo. 
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The last equality holds because for each A G B c , the matrix I+G\ AGi is singular and I+G\G\ > 
on T. As a consequence, the minimum point cannot belong to dC T + . Thus, A° G £+. ■ 
Finally, we are left with the problem of developing an efficient numerical algorithm to compute 
the optimal solution A°. 

VI. Efficient implementation of a matricial Newton-like algorithm 

In order to compute the minimizer of the dual functional J* (A), a matricial Newton-type 
algorithm is proposed. Here are the main steps: (i) the starting point for the minimizing sequence 
{Aj} ieN is A = 0, (ii) at each step we compute the Newton search direction AAj, (iii) we 
compute the Newton step length t*. 

A. Search Direction 

Even though the problem is finite dimensional, the computation of the search direction is rather 
delicate because a matricial expression of the Hessian and the gradient allowing to compute the 
search direction Ax as Ax = —H x ~ l V f x is not available. In order to compute AAj, given 
Aj G C+, one has to solve, for the unknown AAj, the equation H^AAi, •) = — VJ^a^-), 
which can be explicitly written as: 

J G 1 (I + G* 1 A i G 1 )- 1 G* 1 AA i G 1 (I + G* 1 A i G 1 )- 1 G* 1 = J Gi(I + G{A i G 1 y 1 G* 1 - I. 

To this aim, consider a basis of Range (T). It can be readily obtained, by recalling that 
S fc G Range(r) if and only if 3 H k G R mxn s.t. S fc - AT, k A T = BH k + H k T B T . Therefore, 
considering a basis {Hi, . . . , Hl] for W nxn , a set of generators . . . , S' L } can be found by 
solving L Lyapunov equations. After that a basis . . . , Y>' N } can be easily computed^ Since 
/ G RangeT, we can add to each Sj the matrix a^I, and, for suitable (large) «j, get a basis 
{Si, . . . , Sat} of Range(r) made of positive definite matrices. The search direction can now be 
computed by applying the following procedure: 

1) Compute 

Y = J Gi(I + C7*A l G 1 )" 1 G'* - / (39) 

4 lndeed, following the lines detailed in 1161 . it is possible to obtain directly a basis of Range(F) by solving only iV 
Lyapunov equations. 
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2) For each generator compute 

Y k = J G X {I + GlAiG^GlXkdil + GlAiG^Gl (40) 

3) Find {a k } s.t. Y = Y.k a k Y k\ 

4) Set AAj = J2 k a k T*k- 

The most challenging step is to compute Y and Y k . A sensible approach is to employ spectral 
factorization techniques in order to compute the integrals, along the same lines described in 11421 



Section VI]. Indeed, the integrand that appears in equation (39) is a coercive spectral density and 



the same holds for the integrand in ( |40| ), since we have chosen the generators Ej to be positive 
definite. As a consequence, the integral may be computed by means of numerically robust spectral 
factorization techniques. For the computation of Y, let us focus on Qk^z) = I + G*(z)AiGi(z). 
Assume that a realization of the stable minimum phase spectral factor (z) is given (or has been 
computed from Then, we can easily obtain a state-space realization G x (z) = C x (zl— A X )~ 1 B X 
of G\. Since A,- t E C T + , Qk^z) is positive definite on T, so that the following ARE admits a 
positive definite stabilizing solution P = P T > (see, e.g. Lemma 6.4 in [J42j): 



P = AjPA x - AjPB 1 (BjPB 1 + /) X BjPA x + CjAid. (41) 

Moreover, QkX z ) can be factorized as QkX z ) = A* Ai (z)A Ai (z), where A A .(z) can be explicitly 
written in term of the stabilizing solution P: 

A Ai (z) = (BjPB 1 + I)- 1 ^BjPA 1 (zI - A 1 y 1 B 1 + (BjPB 1 + J) 1 . (42) 

It is now easy to compute a state space realization of A^ 1 and then of the stable filter Wy '■ = 

dAll = C l (zI-Z 1 )- 1 B 1 (BjPB 1 + I)- 1 2, with Z x := A x - B 1 (BjPB 1 + iy 1 BjPA l being 



the closed-loop matrix. The computation of (39) is now immediate. In fact, 

Y + I = J G 1 {I + G* l A i G x y 1 G\ = J dA^A^Gl = J W Y W Y . (43) 

The latter integral is thus the steady-state covariance of the output of the stable filter Wy 
driven by normalized white noise. It can be obtained by computing the unique solution of the 
Lyapunov equation R - Z x RZj = B^BjPBx + Bj and setting Y + 1 = C x RCj , so that 

Y = C x RCj - I. (44) 

A similar procedure may be employed to compute also the matrices Y k . 
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B. Step length 

The backtracking line search is implemented by halving the step ti until both the following 
conditions are satisfied: 

A, + tfAA 2 , eC r + ; (45) 
J*(A» + £f AAj) < Jvt(Ai) + at 4 fc VJ*, Al AAi, where < a< 0.5. (46) 

The first condition can be easily evaluated by testing whether Q A +i t M . admits a factorization 
of the kind introduced in the previous subsection or, equivalently, whether the corresponding 
ARE ^4TJ admits a solution P = P T > 0. 

The only difficulty in checking the second condition is in computing 

J* (A) = tr J [A - log(J + G\AGi)\ = tr A - J logdet(J + GJAGi). (47) 

The evaluation of the latter integral can be attained straightforwardly in the light of the fundamen- 
tal result in statistical filtering (j7]). In our case Q(z) = Qa(z) may be factorized as Q\ = A*A, 
where A is a stable and minimum phase filter for which a minimal realization can be computed 



as in the previous section (see eq. (42)). Since log det Qa = logdet[A*A] = logdet[AA*], 
det R is given by det[A(oo)A*(oo)] which may be explicitly written in terms the solution P of 
the corresponding ARE as det[Bj PB\ + I]. Therefore, 

J log det (J + G\hGi) = log det (BjPB 1 + I) . 

C. Convergence of the Proposed Algorithm 

A sufficient condition for global convergence of the algorithm is that the following require- 
ments are satisfied JH Chapter 9]: 

1) J*(-) is twice continuously differentiable; 

2) A G and the sublevel set S := {A e £+| J* (A) < J^(A )} is closed; 

3) J^(-) is strongly convex, i.e. 3 m s.t. H(J^)(A) > ml, VAeS. 

4) The Hessian is Lipschitz continuous in S, i.e. 3L such that: 



Ha 1 — H A2 



< L 

2 



A 2 - Ai 



VAi,A 2 G S. 

2 



In this case, it is possible to prove not only that the algorithm converges, but also that, after a 
certain number of iterations, the backtracking line search always selects the full step (i.e. t = 1). 
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During the last stage the rate of convergence is quadratic, since there exists a constant C such 
that ||Aj + i — A° || < C||Aj — A°|| 2 . Let us examine the requirements one by one. The continuous 
differentiability of the dual function has already been proven in Section [VJ Theorem 5.2 states 
that the sublevel sets of the dual function Jq, are compact, and hence closed (recall that, in a finite 
dimensional vector space, a set is compact if and only if it is closed and bounded). Moreover, 
it is possible to conclude straightforwardly on strong convexity and Lipschitz continuity of the 
Hessian. Indeed, let us consider the sublevel set 

S= {A G C v + | J*(A) < J*(A )}. 

Notice that, assuming that A is the starting point, the minimizing sequence computed by the 
Newton algorithm with backtracking line search is such that, VA; > 0, A& G S. The continuity 
of the Hessian over £^ has already been proven in Section |vj Moreover, since the map from 
a Hermitian matrix to its minimum eigenvalue is continuous (see Lemma 5.1 in Il42~l0 . the map 
from A G to the minimum eigenvalue of H\(5A,5A) is continuous, being a composition 
of continuous maps. Since S is compact, Weierstrass' Theorem holds. Therefore, there exists 
a minimum m in the set of eigenvalues of the Hessian H\(SA,SA), VA G S. Recall that the 



hypothesis of strict convexity holds (as proven in Theorem 5.1 ). As a consequence, the Hessian 
H\ is a positive definite matrix VA G 5, therefore m > 0. In conclusion, there exists m > 
such that Ha > ml, VA G S, i.e. J* (A) is / strongly convex. Concerning the Lipschitz 
continuity of the Hessian of J* (A), it is easy to see that if a is C 1 (£ 1 ^). Indeed the third variation 
8 3 Jy(A; SAi, SA 2 , SA 3 ) can be explicitly computed and its continuity can be proven along the 
same line developed in the proof of Theorem |5.1| (the result can be extended, leading to the 
conclusion that J* (A) is C°°(£+). Continuous differentiability implies Lipschitz continuity on 
a compact set. Therefore, the Hessian is Lipschitz continuous on S. 

In conclusion, global convergence of the Newton algorithm is guaranteed, so that the proposed 
procedure is an effective computational tool to solve the spectral estimation Problem [TJ 

VII. Simulation Results 
We now employ our results in a spectral estimation procedure, that may be outlined as follows. 
1) We start from a finite sequence {y±, . . . , Un}, extracted from a realization of the zero-mean 
Gaussian process y = {y^; k G Z} with values in IR m , whose spectrum is $(e il? ). 
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2) Design a filter G(z), as described by equation ( [TT| ). 

3) Feed the filter with the data sequence {yx, . . . ,y N }, collect the output data Xi and compute 
a consistent estimate £ of the covariance matrix. 

4) In general, since the data length is finite, the estimate £ does not satisfy the conditions 



stated in Theorem 4. 1 Our choice is to guarantee feasibility, by choosing a positive definite 
covariance matrix £ e Range T such that it is close to the starting estimate £ in a suitable 
sense. Such an approach, introduced in [16], is briefly described in Remark ITT 
5) Choose a prior spectral density \I>. 



6) Tackle Problem [1] by means of the proposed algorithm with the chosen \& and £ = £. 
Remark 7.1: As previously observed, the covariance estimate £ does not usually satisfy the 



feasibility requirements stated by Theorem 4.1 In order to apply our method, we need to 
ensure feasibility. Thus, we need to compute an auxiliary positive definite covariance matrix 



£ which satisfies equation ( [12] ) and it is "close" to the estimate £. To this purpose, an ancillary 
optimization problem is defined, so that the "best" approximant £ is chosen as 

£ = argmin D(p||p), (48) 

Se Range T,S>0 

where p and p are zero-mean Gaussian densities with covariance £ and £, respectively. The 
distance index is namely defined as follows: 

B(p\\p) = i logdet£ _1 £ + trXT 1 £ - n . (49) 

This problem can be solved efficiently by means of a matricial Newton algorithm. The reader 
is referred to [fT6ll for a complete exposition. 

Notice that our approach provides two degrees of freedom, since both the prior ^ and the 
filter G(z) can be suitably selected. Concerning the former, it can be chosen to be a coarse 
estimate $ of $ obtained by means of a standard, "simple" estimation method. For instance, $ 
could be a low order ARMA model, computed through PEM methods. Such a choice gives a 
further insight into the meaning of the proposed procedure: Problem [T] consists in computing the 



bounded and coercive spectral density which is consistent with the interpolation condition < [T2| ) 
and is as close as possible to the initial estimate <f> in the distance @. Consider now the design 
of the filter. Recall that its role is to provide interpolation conditions for the approximant: In 
the light of this consideration, it is possible to reinterpret our approach as a generalization of 
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Fig. 1. Estimation of close spectral lines in colored noise (wi = 0.45 rad/s and u)2 = 0.47 rad/s). The chosen prior is the 
sample covariance of the data {j/fe}. The radius of the complex poles is equal to 0.95. Both RER and THREE indicate the 
presence of the two lines. 



classical problems such as Nevanlinna-Pick interpolation and the covariance extension problem, 
as explained in JU Section I]. 

A. Scalar Case 

To begin with, we analyze the resolution capabilities of the proposed approach, by comparing 
its performances with those of the original THREE method. In [5], it is explained how an 
adequate choice of the filterbank poles can improve the estimate's resolution: a higher resolution 
can be attained by selecting poles in the proximity of the unit circle, with arguments in the 
range of frequency of interest. In order to analyze such a feature, we deal with an instance of 
the classical problem of detecting spectral lines in colored noise. The setting is the same described 
in BH Section IV.B]. The process of interest obeys to the following difference equation: 

y{t) = O.5ffln(wit + 0i) + O.5ffln(w a * + 02) + *(t), z(t) = O.Sz(t- 1) + 0.5u(t) +0.25u(t - 1), 
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where the variables 4>i, 4>2 and v(t) are Gaussian, independent, with zero-mean and unit variance. 
Matrix B is a column of ones. Matrix A was chosen as a block-diagonal matrix; its real 
eigenvalues are 0, 0.85 and —0.85 and there are also five pairs of complex eigenvalues, whose 
arguments are equispaced in a narrow range of frequency where the sinusoids lie. Firstly, the 
spectral lines were fixed in u)\ = 0.42 rad/s and CO2 = 0.53 rad/s, and so the complex poles of 
G(z) were chosen as: 

0.9e ±ja42 , 0.9e ±ja44 , 0.9e ±ja46 , 0.9e ±ja48 , 0.9e ±ja50 . 

By considering the constant prior, equal to the sample covariance of the available data, the 
proposed method was able to detect both lines. We considered then the more challenging task 
when tui = 0.45 rad/s and U2 = 0.47 rad/s. This choice makes the value of the distance between 
the two lines lower than the resolution limit of the periodogram, which amounts to ^ (which in 
our case is ~ 0.021 rad/s). Nevertheless, choosing the poles closer to the unit circle, by fixing 
their radius to 0.95, the RER estimator was still able to detect the presence of two lines. Figure 
[T] compares its performances with those achieved by THREE. In simulations, RER exhibited 
performances that were quite similar or slightly better than those of THREE, as in the case 
which is shown, where the peaks that were estimated are slightly closer to the real position of 
the spectral lines than those indicated by THREE. It was also observed that, in general, bringing 
the poles closer to the unit circle increases both the resolution and the variance of the estimates. 
The same trade-off was first described in [5] and seems to be typical of all THREE-like methods. 

B. Multivariate Case 

In order to test the performances of the proposed method in multivariate spectral estimation, 
we considered the same estimation task that is described in ll42l Section VIII.C]. The process y 
was obtained by filtering a bivariate Gaussian white noise process with zero mean and variance 
/ through a square shaping filter of order 40. The filter coefficients were chosen at random, 
except for one fixed complex poles pair, 0.9e ±ja52 and the zeros pair (1 — 10~ 5 )e ±ja2 . 

We designed the filter G(z) by fixing four complex poles pairs with radius 0.7 and arguments 
equispaced in the range [0,n]. We assumed N = 300 samples of the process {yk}kez to be 
available. As for the the prior, our choice was to compute a simple PEM model of order 
3, by means of the standard function pern provided in Matlab. Figure [2] shows the real 
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spectrum and the estimate computed by the RER approach. We then compared the performance 




9?(Phi12), true spectrum 
9?(Phi12), estimate RER 




Fig. 2. Multivariate spectrum estimation, TV = 300, PEM(3) prior. Comparison between the approximant and the true spectrum. 

of the proposed technique to those achieved by Maximum Entropy ll20l and Hellinger-di stance 
estimators, which are both THREE-like approaches to multivariate spectral estimation. Notice 
that the latter represented the state-of-the art of the convex optimization approach to spectral 
estimation in the multichannel framework [fT51 . In order to make the comparison as independent 
as possible of the specific data set, we performed 50 trials by feeding the shaping filter with 
independent realizations of the input noise process. The criterion to evaluate the performances 
of each method was taken to be the average estimation error at each frequency, defined as 

1 50 

i=i 

Here M denotes the specific algorithm, <& M is the corresponding approximant and the norm is the 
spectral norm (i.e. the largest singular value). Figure [3] allows to compare the various techniques. 
The results achieved by our approach are quite better than those of Maximum Entropy estimator 
(referred to as ME). Our RER method seems also to slightly outperform the Hellinger-di stance 
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Error norm, Hellinger 
Error norm, ME 
Error norm, RER 




Fig. 3. Comparison of THREE-like approaches in terms of average estimation error. N = 300 available data. Both RER and 
Hellinger estimator are provided with a PEM(3) prior. All the considered methods make use of the same filter G(z) 



approach. It is worth noticing that in the Hellinger case the order of the estimates was 19, while 
in the case of RER it was just 11. The order of the ME estimate is equal to 8. 

It is interesting to investigate the case when only a few samples of the process y are available. 
Shortness of the available data record, can heavily affect with artifacts the estimates obtained by 
classical methods such as Matlab's PEM and N4SID. As the other THREE-like approaches, 
the RER method seems to be quite robust with respect to such a problem. Figure [4] shows the 
results obtained in a case where only N = 100 samples are available. Both PEM and N4SID 
estimates are affected by artifacts. On the contrary, the proposed approach is not. This result 
seems to suggest that RER estimation is suitable to tackle spectral estimation issues characterized 
by the presence of short data records of the process of interest. 
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Fig. 4. Comparison of MATLAB's PEM, MATLAB's N4SID and RER in terms of average estimation error. N = 100 available 
samples. The prior considered for RER is a PEM(2) model. The filter G(z) has a pole in the origin and four complex conjugate 
poles pairs with radius 0.7. Notice that RER does not exhibit artifacts, whereas PEM and N4SID do. 



VIII. ON THE SPECTRAL REPRESENTATION OF STATIONARY GAUSSIAN PROCESSES 

We shall need the following result whose proof is given in the Appendix. 

Lemma 8.1: Let u,v be A;-dimensional, real random vectors with probability densities p,q, 
respectively. Let / : R. k — > M. h be measurable and p a , q a be the probability densities of the 
augmented vectors [u T /(w) T ] T and [v T f(v) T ] T , respectively. Then 

B(p a \\q a )=B(p\\q). (51) 

We also need to consider zero-mean, n-dimensional, complex-valued Gaussian random vectors 
z = a+j/3, where the real and imaginary parts are jointly Gaussian. The corresponding density 
function is the joint probability density of the 2n-dimensional compound vector 7 = [a T /3 T ] T . 
The differential entropy of the n-dimensional complex Gaussian density p (defined exactly as 
in ([!]) but with integration taking place over M. 2n ) with zero mean is given by 

Hip) = - [ log(p(x))p(x)dx = I log(det R) + \{2n) (1 + log(27r)) , (52) 
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where R is the covariance matrix of the 2n-dimensional vector 7. Similarly, the relative entropy 
between two zero-mean n-dimensional complex Gaussian densities p and q is given by 

B(p\\q) := l - [logdet^ 1 ^) + tr^" 1 ^) - 2n] , (53) 

where R p and R q are the covariance matrices of the 2n-dimensional vectors 7 P and 7^ corre- 
sponding to the densities p and q, respectively. If the zero-mean, C n -valued Gaussian random 
vector z has the property that E[z,2 T ] = 0, then we say that z is a circular symmetric normally 
distributed random vector [j37l . This implies that E[ao; T ] = E[/3/3 T ]. If p and q are two n- 
dimensional complex Gaussian distribution with circular symmetry, the expression of the relative 
entropy simplifies to the formula 

B(p\\q) = log det(p- 1 Q) + tr(g- x P) - n, (54) 

where P and Q are the covariance matrices of 7 P and jg, respectively. 

We now state a few basic facts about the spectral representation of a stationary process that 
can be found, for instance, in J34[, ll45l . 051 . Let y = {y k \ k e Z} be a IR m - valued, zero-mean, 
Gaussian, stationary process and let C\ := E{y k+ iyJ}, I e Z, be its covariance lags. Then 

Q = ^- Te^dF^), (55) 



— 7T 



2tt 

where F is a bounded, non-negative, matrix-valued measure called spectral measure. The sta- 
tionary process y admits itself the spectral representation 

y k = r^dy(e^), (56) 

J —n 

where y is a m-dimensional stochastic orthogonal measure, see [|45ll . It may be obtained by 
defining, as in [|35l pag. 44], 



if k ^ 

X*(*i,0 a ):=< " 2wJfc , (57) 

^1 if = 



and setting 

N 

y(ei* e>"*):= lim V) XkWiMVk (58) 



fe=-JV 

where the sequence converges in mean square. We use the notation dy(e^) as a short-hand for 
#(ei* ei(*w*)) (with eft? > 0). It is well known that 

E {dy^d^e 1 "*)*} = dF(tf), (59) 
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where * denotes transposition plus conjugation. If the process y is purely nondeterministic, then 
dF($) = ^ y (e^)d'd, where $ y is the spectral density function. 

Proposition 8.1: Suppose i?i,i?2 G ( — 7r, tt], then y(e~^ 2 , e - ^ 1 ) = y{e^, e^ 2 ). If, moreover, 
i?i,i?2 have the same sign, then, {/(e-^ 1 , e^ 2 ) is a circularly symmetric, normally distributed 
random vector. Finally, let #1, $ 2 , #3, $4 be such that $ 2 ]n[$ 3 , i? 4 ] = i? 2 ]n[-i? 4 , -# 3 ] = 0. 
Then, ^(e^ 1 , e^ 2 ) and y(e j,?3 , e^ 4 ) are independent random vectors. 

Proof: Observe that y(e s§1 , e^ 2 ) is a complex- valued random vector that may be written 



as y(e^,^ 2 ) = y r (e^\e^ 2 ) + m{e^, e J '* 2 ). In view of <g§ the real part y r (e^, e^ 2 ) and 
the imaginary part y^e^ 1 , e^ 2 ) are jointly Gaussian real random vectors, so y(e^ x , e-^ 2 ) is a 
complex- valued Gaussian vector. Since is a IR m -valued process, ^(e^ 1 , e^ 2 ) (which may 
be thought of as an integrated version of a "Fourier transform") has the Hermitian symmetry 
or equivalently y(e~^ 2 , e~^' L ) = ^(e^ 1 , e^ 2 ). Moreover, for i?i and # 2 with the same sign, 
yie^ 1 , e^ 2 ) and y(e~ j,?2 , e - ^ 1 ) are orthogonal. Thus, we get 

= E[jKe^\e^ 2 )£(e-^ 2 ,e- j ^)*] = ^[y(e^\e^ 2 )y(e i{>1 ,e^ 2 ) T ] 

or, equivalently, y(e jl?1 , e-^ 2 ) is circularly symmetric normally distributed. Finally, recall that two 
complex Gaussian random vectors v±, t> 2 are independent if and only if E[v\vJ} = E[viV$\ = 
0. In our case, by the orthogonality property, we have E[y(e il91 , e^ 2 )y(e^ 3 , e^ 4 )*} = and 
E[y(e^,^ 2 )y(e^,e^) r ) = E[y(e^, e^ 2 )y(e-^, e~^)*] = 0. 



IX. Spectral relative entropy rate 

Consider two zero-mean, jointly Gaussian, stationary, purely nondeterministic stochastic pro- 
cesses y = {yk~, k E Z} and z = {zk, k E Z} taking values in IR m with spectral representation 

Vh = f ei M dy(<j% E {dy(^)dy(e^)*} = %{e^)d§, (60) 

J —7V 

/7T 
e* M dz(el*), E{dz(j*)dz(j°)*} = <$? z {e^)d§. (61) 
-7T 



January 18, 2013 



DRAFT 



DRAFT 



26 



Let i?i 



, and consider the complex Gaussian random vectors 



y(e^ k ,e^ k+1 ) and 



5z k := z(e^ k , e^ k+1 ), with k — 0, 1 ... , 2n. Define now the random vectors 



= 1 2n, 









5z 


Y k : = 










Syk-i 







(62) 



and denote their joint probability density by p(Y k ) an d p{Z k ), respectively. 

Definition 9.1: The spectral relative entropy rate between between y and z is defined by the 
following limit, provided it exists: 

1 



\{dy\\dz) : = lim — D [p(Y 2n )\\p(Z 2n ] 

n->oo An 



(63) 



We now establish a remarkable connection between time-domain and spectral-domain relative 
entropy rates. 

Theorem 9.1: Let y and z be as above. Assume that both $ y and $ 2 are piecewise continuous, 
coercive spectral densities. The following equality holds: 

)• (64) 



Proof: In view of Proposition 8.1 , the last n components of Y 2n are functions (the complex 
conjugate) of the first n and the same holds for Z 2n . Hence, in view of Lemma 

D (p(Y 2n ) \\p(Z 2n ) 



8.1 



we have 



8.1 



we have that the elements 



^p(Y n ) \\p(Z n ) ) . Using again Proposition 
of Y n are independent random vectors and the same holds for the elements of Z n . Hence, we 
have the following additive decomposition: 



n— 1 



p{Y 2n )\\p{Z 2n ] 



p(Y n )\\p(Z n )) = ^BWyJMSzk)), 



(65) 



fc=0 



with p(8y k ) and p(5z k ) being the probability densities of the random vector 5y k = y(e^ k , e^ fe+1 ) 
and 5z k =z[e ydk , e ydk+1 ), respectively. Since 5y k and 5z k are jointly Gaussian and circularly 



symmetric, by ( |54| ) and (]60])-([6T|), we get, 

B(p(5y k )\\p(5z k )) = logdet [Qy\'&k^k+x)Qz{'&k^i)\ +tr [Q' 1 ^, $ k+ i)Q y ($ k , #k+i)] ~m, 

(66) 

where, by virtue of the orthogonal increments property, 



k+lj 



'fc + 1 



'fe+i 



$,(e j «)o?e 



(67) 
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By piecewise continuity and the mean value theorem, we have that, except for a finite number 

of k's, 



(p(5yk)\\p(5z k )) = logdet 



7T 



n 



7T 



n 



+ tr 



nJ n 



-i 



7T 



logdet[$ 2/ (e i ^)- 1 $ 2 (e j ^)] + tr 



$ < (e 1 * fc )- :L $ v (e i * k ; 



— m 



(68) 



where < § k < d k+ i. By employing the latter expression together with (65) and (63), we get 



1 1 n—l 

(dy\\dz) = lim — B>(Y n \\Z n ) = lim — VD (p(dy k )\\p(5z k )) 

fc=0 



n-1 



lim — lo S det $ J ,(e j * fc ) _1 $ 2 (e j * fc ) + tr ^(e^) ($ y (e j ^) - $ z (e^ 



k=0 
n-1 



fc=0 



lim — V j logdet ^(e^)" 1 ^^) + tr 



$tVN h,,(e^ k ) -$ ? (e j ^ 



7T 
?7 



2tt 



47T 



{logdet (^V)^'*)) + tr [^(e*) (^( ej ") - M^*))] } d-d 
{logdet (^(e^^e*)) +tr [<&jV*) (%(^) - $ z (e^))] } d0, 



which, by (|9]>, is ( |64| ). ■ 
Remark 9.1: As is well known, the fundamental property of the Fourier transform is that it 
is isometric. The above result may be interpreted as a further invariance principle of the Fourier 
transform: the relative entropy rate is the same in the time and spectral domain. 

X. Conclusion 

In this paper, a profound information-theoretic result relating time and spectral domain relative 
entropy rates of stationary Gaussian processes has been established. Motivated by this result, a 
new THREE-like approach to multivariate spectral estimation, called RER, has been introduced 
and tested. It appears as the most natural extension of maximum entropy methods when a prior 
estimate of the spectrum is available. It features an upper bound on the complexity of the 
estimate which is equal to the one provided by THREE in the scalar context, sensibly improving 
on the best one so far available in the multichannel setting with prior estimate. As for previous 
THREE-like methods, RER exhibits high resolution features and works extremely well with 
short observation records outperforming Matlab's PEM and Matlab's N4SID. 
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Appendix 

Proof of Lemma |&7] [41 ]: Recall the variational formula for relative entropy lfT3l : 



!)(p\\ q ) = sup {E[<p(v)} - logE[y (u) ] } 



(69) 



where <p is the set of all measurable and bounded functions cp : IR fc — > R. Consider a measurable 
and bounded function ip : R k — > R. Define <p a ■ R k+h — > R by 



where x' E R h . Obviously, (p a is bounded and measurable, and 

E[<p(v)] - logE[e^ u) ] = E[(p a (v a )] - logE[e<^] < B(p a \\q a ) 



(70) 



(71) 



By taking the supremum, we get that D(p||g) < B)(p a \\q a ). The opposite inequality can be 
proven along the same lines. Indeed, let if> a : R h+k — > M. be a measurable and bounded function. 
Define if) : R k — > R by if)(x) := ip a (x, f(x)). Then, if; is measurable and bounded too, so that 



E[ip a (v a )] - logE[e^ (Ma) ] = E[if)(v)] - logE[e^ (tt) ] < B(p\\q). 
In view of ([69]), we now get O(jo a ||g a ) < B(p||g). 



(72) 



Proof of Lemma 5.3 



1) As a consequence of Lemma [5^2 

^(A) = 



tr 



> / tr 



1 



A - log(/ + GlAd + -I) 



n 



H(I + GJAGi) - log(/ + GJAGi + -/) 



n 



(73) 



a. 



Let {xi} be the eigenvalues of (/ + G^AGi). Then, 



tr 



/i(J + GlAd) - \og(I + C7^Al7i + -/) 



n 



+ a 



^J2 Xi ~J2 lo s( Xi + ~) + a= I 



(74) 



p(xi, . . . ,x m ) + a, 
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where p(x 1} ...,x m ):=p J2T=i x * ~ XX 1 lo S { x i + k) ■ Moreover, 

Vz. 



The minimum of p is thus attained by choosing Xj 



1 _ 1 



Vz. Therefore, 



p(xi, . . .,x m ) > m 



pm 



n 



+ m log p 



The fact that J$(A) is bounded from below over now follows: 



J£(A) > a + 27r[m + mlog/i - — ] > a + 27rm[l + log/i]. 



n 



2) Beppo Levi's Theorem allows to conclude that J£°(A) = J* (A) in C + : 



J~(A) = J tr[A}- J tr 



1 



lim log(J + G{AG 1 + -/) 



n 



MA). 



(75) 



(76) 



3) Since, for A G B c , the rational function det (/ + G\AGi) is not identically zero, its 
logarithm is integrable over (— 7r, 7r]. Hence, J^°(A) is finite. J^°(A) = +oo instead for 
A G £. 



Proof of Lemma 5.4- In view of Lemma |5.2 

tr [A] > p tr 



y (C*AGi + J) 



a > a, 



(77) 



so that tr [A] is bounded from below. Consider a sequence {A k } k N G £+, such that 



lim ||Afc|| = +oo. 

k— >oo 



'-J. Since £^ is convex and A = belongs to V£ G [0, 1], £A G Therefore 



A° G for sufficiently large k. Let 77 := liminf tr [A°] In view of (77) 

1 



trA° 



trA fc > 



7^— n-a -> 0, 
A* 11 



l|Afe|| ii^vjfei, 

for ||A fc || 00, so r] > 0. Thus, the sequence {A°} has a subsequence such that the limit 
of its trace is rj. Given that A° belongs to the surface of the unit ball, which is compact, the 
subsequence contains a subsubsequence {A° m } fc gN that is convergent. Define 

A^ := lim A° . 

The next step is to prove that A^, G C v + . To this aim, notice that A^, is the limit of a convergent 
sequence in the finite-dimensional linear space Range (T). Therefore it belongs to Range (T). 
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Moreover, recall that the primary sequence {A fe } fceN has elements belonging to C T + . It means 
that, for each A k , (I + G\A k Gi) > 0. As a consequence, it holds that, for each m, 

1 

I At. 



/ + G^Al G x > on T . 



Taking the pointwise limit for m — > oo, it results that G^A^Gi is positive semidefinite on T, 
and so (I + G^A^Gi) is strictly positive definite on T. Therefore, A^, G £+. 



The next step is to prove that trA^ > 0. If the feasibility condition (16) holds, there exists 
$/ such that I = J G$/G*. Therefore, it is possible to write: 



trA° =tr / G<j? 7 G*A 



■>* A 

OO 



tr [W^W^A^GW^W^j] 



tr 



tr 



(78) 



where the coercive spectral density E is defined as in Lemma 5.2 Since G*A^Gi > 0, in order 
to prove that tr [A^J is positive, in view of ([V8]) it is sufficient to show that G^A^Gi is not 
identically zero. Assume by contradiction that G^A^Gi = 0. As a consequence, Ve^ G T, 



= GJA^Gi = WZGTA^GWv 



* A 



(79) 



Therefore, G*A^G = 0. However, this means that Aj^, G Range(r)- 1 . But it has already been 
proven that Aj^ G Range(r). Moreover, A^, ^ 0, since it belongs to the surface of the unit 



ball. This is a contradiction. Thus, G^A^Gi is not identically zero, and from (78) it follows 
that r\ = tr A^ > 0. It follows that there exists K such that trA° > \ for all k > K. Notice 
that G\G\ is positive definite on T (and indeed coercive). Moreover, G\A\G\ < G\G\, since 
A° belongs to the unit ball. Therefore, 

liminf Jy(A k ) = liminf / tr [A k - log(J + GlA k Gi)} 

k— >oo k— >oo J 

log 



= liminf tr [||A fc ||A°l - liminf / tr 

k— s>oo k— >oo J 
7] f 

> liminf || Afe || liminf / log||A^| 

k— >ao 2 k— >oo I 



IA* 



lim inf 

k— >oo 



Ai 



tr 



log 



G\Aj,G\ 
1 



I A, I 



I + G*Gi 



lim inf — 

k— s>oo 2 



I A, 



An 



log||Afc|| I — liminf / tr 

fc— >oo 



log 



HA, 



-I + Gld 



+oo. 
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